Exploration on Effectiveness and Efficiency of Similar Sentence Matching
نویسندگان
چکیده
Similar sentence matching is an essential issue for many applications, such as text summarization, image extraction, social media retrieval, question-answer model, and so on. A number of studies have investigated this issue in recent years. Most of such techniques focus on effectiveness issues but only a few focus on efficiency issues. In this paper, we address both effectiveness and efficiency in the sentence similarity matching. For a given sentence collection, we determine how to effectively and efficiently identify the top-k semantically similar sentences to a query. To achieve this goal, we first study several representative sentence similarity measurement strategies, based on which we deliberately choose the optimal ones through cross validation and dynamically weight tuning. The experimental evaluation demonstrates the effectiveness of our strategy. Moreover, from the efficiency aspect, we introduce several optimization techniques to improve the performance of the similarity computation. The trade-off between the effectiveness and efficiency is further explored by conducting extensive experiments.
منابع مشابه
Exploration of Term Dependence in Sentence Retrieval
This paper focuses on the exploration of term dependence in the application of sentence retrieval. The adjacent terms appearing in query are assumed to be related with each other. These assumed dependences among query terms will be further validated for each sentence and sentences, which present strong syntactic relationship among query terms, are considered more relevant. Experimental results ...
متن کاملImproving fuzzy matching through syntactic knowledge
Fuzzy matching in translation memories (TM) is mostly string-based in current CAT tools. These tools look for TM sentences highly similar to an input sentence, using edit distance to detect the differences between sentences. Current CAT tools use limited or no linguistic knowledge in this procedure. In the recently started SCATE project, which aims at improving translators’ efficiency, we apply...
متن کاملOn the computational complexity of finding a minimal basis for the guess and determine attack
Guess-and-determine attack is one of the general attacks on stream ciphers. It is a common cryptanalysis tool for evaluating security of stream ciphers. The effectiveness of this attack is based on the number of unknown bits which will be guessed by the attacker to break the cryptosystem. In this work, we present a relation between the minimum numbers of the guessed bits and uniquely restricted...
متن کاملEvaluation of effective factors in window optimization of fry analysis to identify mineralization pattern: Case study of Bavanat region, Iran
The known ore deposits and mineralization trends are important key exploration criteria in mineral exploration within a specific region. Fry analysis has conventionally been considered as a suitable method to determine the mineralization trends related to linear structures. Based upon literature sources, to date, no investigation has been carried out that includes the Sensitivity Analysis of Fe...
متن کاملImproving Translation Memory with Word Alignment Information
This paper describes a generalized translation memory system, which takes advantage of sentence level matching, sub-sentential matching, and pattern-based machine translation technologies. All of the three techniques generate translation suggestions with the assistance of word alignment information. For the sentence level matching, the system generates the translation suggestion by modifying th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Polibits
دوره 47 شماره
صفحات -
تاریخ انتشار 2013